Skip to content

Performance improvements to CEL rule evaluation#460

Merged
stefanvanburen merged 3 commits intomainfrom
svanburen/perf-improvements
Apr 24, 2026
Merged

Performance improvements to CEL rule evaluation#460
stefanvanburen merged 3 commits intomainfrom
svanburen/perf-improvements

Conversation

@stefanvanburen
Copy link
Copy Markdown
Member

Three improvements to hot-path validation, inspired by protovalidate-go.

Guard _validate_cel when self._cel is empty (bufbuild/protovalidate-go#261)

Adds an early return in CelRules._validate_cel when there are no CEL runners, skipping activation dict creation and datetime.now() entirely. Also guards the _msg_to_cel call in MessageRules.validate() so the message-to-CEL conversion is skipped when there are no CEL rules to run. Benchmark (required-only field, 0 CEL runners): -37%.

Skip now computation when unused (bufbuild/protovalidate-go#289)

Adds _uses_now to CelRules, set at compile time in add_rule by checking whether "now" appears in the expression string. Only timestamp.gt_now, timestamp.lt_now, timestamp.within, and custom expressions referencing now will call datetime.datetime.now(). Benchmarks: int32.gt (5 CEL runners) -25%, timestamp.gt_now -6%.

Early-exit loop in cel_unique (bufbuild/protovalidate-go#289)

Replaces len(val) == len(set(val)) with a loop that returns False on the first duplicate, avoiding building the full set unnecessarily.

@stefanvanburen stefanvanburen requested review from a team, anuraaga and timostamm April 24, 2026 13:35
Comment thread protovalidate/internal/extra_func.py
@stefanvanburen stefanvanburen force-pushed the svanburen/perf-improvements branch from 26bab39 to 232d6a5 Compare April 24, 2026 13:48
rules = validate_pb2.Rule()
rules.id = expression
rules.expression = expression
if "now" in rules.expression:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a string search, correct? Could there be false positives? Do they matter?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, you're right. probably not a huge deal (since it's just false positives rather than a correctness issue), but moved over to a regex in f6b1c39. wydt?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have benchmarks on the difference? You'd need a rule that had now used for time and a rule with now used as a substring and compare.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling is that the string search is fine, but that's just a guess. Maybe there are a large number of rules with the substring "now" that aren't doing time comparisons.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the regex ought to be slower; reverted in 6071c3f. I think some false-positives are fine.

Three improvements to hot-path validation, inspired by protovalidate-go.

Guard `_validate_cel` when `self._cel` is empty (bufbuild/protovalidate-go#261)

Adds an early return in `CelRules._validate_cel` when there are no CEL
runners, skipping activation dict creation and `datetime.now()` entirely.
Also guards the `_msg_to_cel` call in `MessageRules.validate()` so the
message-to-CEL conversion is skipped when there are no CEL rules to run.
Benchmark (required-only field, 0 CEL runners): -37%.

Skip `now` computation when unused (bufbuild/protovalidate-go#289)

Adds `_uses_now` to `CelRules`, set at compile time in `add_rule` by
checking whether `"now"` appears in the expression string. Only
`timestamp.gt_now`, `timestamp.lt_now`, `timestamp.within`, and custom
expressions referencing `now` will call `datetime.datetime.now()`.
Benchmarks: `int32.gt` (5 CEL runners) -25%, `timestamp.gt_now` -6%.

Early-exit loop in `cel_unique` (bufbuild/protovalidate-go#289)

Replaces `len(val) == len(set(val))` with a loop that returns `False` on
the first duplicate, avoiding building the full set unnecessarily.
The simple `"now" in expression` substring check could produce false
positives for identifiers like `renown` or `knows`, setting `_uses_now`
unnecessarily. Replace with `_NOW_RE = re.compile(r"\bnow\b")` so only
the standalone identifier is matched.

The `\b` assertion matches at the boundary between word characters
(`[a-zA-Z0-9_]`) and non-word characters. Since CEL identifiers are
`[_a-zA-Z][_a-zA-Z0-9]*`, any occurrence of `now` as an identifier —
including inside expressions like `timestamp(now)` — is always flanked
by non-identifier characters, so this approach produces no false
negatives.
@stefanvanburen stefanvanburen force-pushed the svanburen/perf-improvements branch from 232d6a5 to f6b1c39 Compare April 24, 2026 14:13
@stefanvanburen
Copy link
Copy Markdown
Member Author

@jonbodner-buf going to merge since you approved the first commit and just reverted the second; thanks for taking a look!

@stefanvanburen stefanvanburen merged commit 7aae31b into main Apr 24, 2026
12 checks passed
@stefanvanburen stefanvanburen deleted the svanburen/perf-improvements branch April 24, 2026 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants